Energy-Efficient Hardware-Accelerated Synchronization for Shared-L1-Memory Multiprocessor Clusters
Abstract
The steeply growing performance demands for highly power- and energy-constrained processing systems such as end-nodes of the Internet-of-Things (IoT) have led to parallel near-threshold computing (NTC), joining the energy-efficiency benefits of low-voltage operation with the performance typical of parallel systems. Shared-L1-memory multiprocessor clusters are a promising architecture, delivering performance in the order of GOPS and over 100 GOPS/W of energy-efficiency. However, this level of computational efficiency can only be reached by maximizing the effective utilization of the processing elements (PEs) available in the clusters. Along with this effort, the optimization of PE-to-PE synchronization and communication is a critical factor for performance. In this article, we describe a light-weight hardware-accelerated synchronization and communication unit (SCU) for tightly-coupled clusters of processors. We detail the architecture, which enables fine-grain per-PE power management, and its integration into an eight-core cluster of RISC-V processors. To validate the effectiveness of the proposed solution, we implemented the eight-core cluster in advanced 22 nm FDX technology and evaluated performance and energy-efficiency with tunable microbenchmarks and a set of real-life applications and kernels. The proposed solution allows synchronization-free regions as small as 42 cycles, 41× smaller than the baseline implementation based on fast test-and-set access to L1 memory, when constraining the microbenchmarks to 10 percent synchronization overhead. When evaluated on the real-life DSP-applications, the proposed SCU improves performance by up to 92 percent and 23 percent on average, and energy efficiency by up to 98 percent and 39 percent on average.
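The relation between synchronization cost, the overhead budget, and the smallest usable synchronization-free region (SFR) can be sketched with a small model. This is only an illustration: the overhead definition used here (synchronization cycles as a share of total cycles) is an assumption and may differ from the article's, and the names `min_sfr_cycles` and `overhead_pct` are hypothetical. Only the 42-cycle and 41× figures come from the abstract.

```python
# Illustrative model only. The overhead definition is an ASSUMPTION,
# not taken from the article: overhead = sync / (sync + sfr).


def overhead_pct(sync_cycles: int, sfr_cycles: int) -> float:
    """Synchronization overhead as a percentage of total cycles (assumed model)."""
    return 100.0 * sync_cycles / (sync_cycles + sfr_cycles)


def min_sfr_cycles(sync_cycles: int, max_overhead_pct: int) -> int:
    """Smallest SFR (in cycles) that keeps overhead at or below the budget."""
    if sync_cycles < 0:
        raise ValueError("sync_cycles must be non-negative")
    if not 0 < max_overhead_pct <= 100:
        raise ValueError("max_overhead_pct must be in (0, 100]")
    numerator = sync_cycles * (100 - max_overhead_pct)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-numerator // max_overhead_pct)


# Figures quoted in the abstract.
SCU_MIN_SFR = 42      # cycles, at 10 percent overhead
BASELINE_RATIO = 41   # baseline SFR is about 41x larger

# Implied minimum SFR of the test-and-set baseline, in cycles.
baseline_min_sfr = SCU_MIN_SFR * BASELINE_RATIO

if __name__ == "__main__":
    print("baseline minimum SFR (cycles):", baseline_min_sfr)
    # Hypothetical primitive costing 10 cycles, with a 10 percent budget.
    print("minimum SFR for a 10-cycle primitive:", min_sfr_cycles(10, 10))
```

Under this assumed model, a cheaper synchronization primitive shrinks the minimum SFR proportionally, which is why the quoted 41× reduction in SFR size translates into much finer-grained parallelism at the same overhead budget.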
Related resources
Efficient ICCG on a Shared Memory Multiprocessor
In this paper we discuss different approaches for exploiting parallelism in the ICCG method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are impleme...
Efficient Architectural Support for Secure Bus-Based Shared Memory Multiprocessor
Tamper-evident and tamper-resistant systems are vital to support applications such as digital rights management and certified grid computing. Recently proposed schemes, such as XOM and AEGIS, assume that only the processor state is trusted when building secure systems. Secure execution on a shared memory multiprocessor is a challenging problem, as multiple devices need to be trusted. In this work, we propose a fr...
Hardware Supported Synchronization Primitives for Clusters
Parallel architectures with shared memory are well suited to many applications, provided that efficient shared memory access and process synchronization mechanisms are available. When the parallel machine is a cluster with physically distributed memory, software based synchronization mechanisms together with virtual memory infrastructure can implement Software Distributed Shared Memory (S-DSM),...
Shared Memory Multiprocessor Support for SAC
Sac (Single Assignment C) is a strict, purely functional programming language primarily designed with numerical applications in mind. Particular emphasis is on efficient support for arrays both in terms of language expressiveness and in terms of runtime performance. Array operations in Sac are based on elementwise specifications using so-called With-loops. These language constructs are also wel...
Shared Memory Synchronization
Synchronization is a fundamental problem in computer science. It is fast becoming a major performance and design issue for concurrent programming on modern architectures, and for the design of concurrent and distributed systems. In this survey, I have tried to gently introduce the reader to some of the most fundamental issues and classical results underlying the design of concurrent systems, so...
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2021
ISSN: 1045-9219, 1558-2183, 2161-9883
DOI: https://doi.org/10.1109/tpds.2020.3028691